25 research outputs found

    A data driven equivariant approach to constrained Gaussian mixture modeling

    Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic. This is due to the unboundedness of the likelihood, together with the presence of spurious maximizers. Existing methods to bypass this obstacle are based on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done by imposing constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained equivariant approach, where the class-conditional covariance matrices are shrunk towards a pre-specified matrix Psi. Data-driven choices of the matrix Psi, for when a priori information is not available, and the optimal amount of shrinkage are investigated. The effectiveness of the proposal is evaluated on the basis of a simulation study and an empirical example.

    Chapter Profiling visitors of a national park in Italy through unsupervised classification of mixed data

    Cluster analysis has long been an effective tool for analysing data, and several disciplines, such as marketing, psychology and computer science, to mention just a few, have benefited from it over time. Traditionally, this kind of algorithm handles only numerical or only categorical data at a time. In this work, instead, we analyse a dataset composed of mixed data, namely both numerical and categorical variables. More precisely, we focus on profiling visitors of the National Park of Majella in the Abruzzo region of Italy, whose observations are characterized by variables such as gender, age, profession, expectations and satisfaction rate on park services. Applying a standard clustering procedure would be wholly inappropriate in this case. Therefore, we propose an unsupervised classification of mixed data, a specific procedure capable of processing both numerical and categorical variables simultaneously, yielding valuable information. In conclusion, our application emphasizes how cluster analysis for mixed data can uncover particularly informative patterns, laying the groundwork for accurate customer profiling, the starting point for a detailed marketing analysis.

    Waste Management Analysis in Developing Countries through Unsupervised Classification of Mixed Data

    The increase in global population and the improvement of living standards in developing countries have resulted in higher solid waste generation. Solid waste management increasingly represents a challenge, but it might also be an opportunity for the municipal authorities of these countries. To this end, awareness of a variety of factors related to waste management, and an effective in-depth analysis of them, might prove to be particularly significant. For this purpose, and since the data are both qualitative and quantitative, a cluster analysis specific to mixed data has been implemented on the dataset. The analysis allows us to distinguish two well-defined groups. The first one is poorer, less developed, and less urbanized, with a consequent lower life expectancy of inhabitants. Consequently, it registers lower waste generation and lower CO2 emissions. Surprisingly, it is more engaged in recycling and in awareness campaigns related to it. Since the discrimination between the two groups is well defined, the second cluster registers the opposite tendency for all the analyzed variables. In conclusion, this kind of analysis offers a potential pathway for academics to work with policy-makers in moving toward the realization of waste management policies tailored to the local context.

    Relationships between Renewable Energy Consumption, Social Factors, and Health: A Panel Vector Auto Regression Analysis of a Cluster of 12 EU Countries

    One of the key indicators of a population’s well-being and the economic development of a country is represented by health, the main proxy for which is life expectancy at birth. Some factors, such as industrialization and modernization, have allowed this to improve considerably. On the other hand, along with high global population growth, the factor which may jeopardize human health the most is environmental degradation, which can be tackled through the transition to renewable energy. The main purpose of our study is to investigate the relationship between renewable energy consumption, social factors, and health, using a Panel Vector Auto Regression (PVAR) technique. We explore the link between some proxy variables for renewable energy consumption, government policy, general public awareness, the market, lobbying activity, the energy dependence on third countries, and health, spanning the period from 1990 to 2015, for a cluster of 12 European countries characterized by common features. Specifically, our analysis shows the importance of having a stringent policy for the development of renewable energy consumption and its influence over other social factors, rather than the existence of causal relationships between health and renewable energy consumption for the analyzed countries. This kind of analysis has great potential for policy-makers. Further, a deeper understanding of these relationships can create a more effective decision-making process.

    Simultaneous inference on diversity of biological communities


    Towards a robust baseline for long-term monitoring of Antarctic coastal benthos

    The Southern Ocean represents one of the world regions most sensitive to warming and there is an urgent need for quantitative data to understand changes in coastal communities. This goal can be achieved through the establishment of permanent monitoring sites and robust sampling designs. In this study, we used an emerging, photogrammetry-based technique to simulate a pilot study and test the efficiency of different sampling schemes (Simple Random, SRS; Systematic, SyS; and Strip, SS) for estimating the abundances of megabenthic taxa. For taxa showing an aggregated distribution, we also applied an adaptive cluster sampling (ACS) design. In almost all cases, the best accuracy of estimates was achieved with SyS combined with plots of 0.0625 m². The ACS design gave better performance but required a calibration of both the initial sample size and the threshold value to increase efficiency. The ‘one-size-fits-all’ 1 m² plot size never emerged as the best in any sampling scheme, hence the previously published literature data can be biased. This study represents a fine-scale reference baseline for the study area and the simulations performed will be pivotal in establishing sound monitoring programmes with sufficient statistical power to detect significant changes in the Antarctic benthos.

    Application of adaptive cluster sampling with a data-driven stopping rule to plant disease incidence

    Plant pathologists need to manage plant diseases at low incidence levels. This needs to be performed efficiently in terms of precision, cost and time because most plant infections spread rapidly to other plants. Adaptive cluster sampling with a data-driven stopping rule (ACS*) was proposed to control the final sample size and improve the efficiency of ordinary adaptive cluster sampling (ACS) when prior knowledge of the population structure is not available. This study seeks to apply the ACS* design to plant diseases at various levels of clustering and incidence. Results from a simulation study show that ACS* is as efficient as the ordinary ACS design at low levels of disease incidence with highly clustered diseased plants, and is an efficient design compared with simple random sampling (SRS) and ordinary ACS for some highly to less clustered diseased plants with moderate to higher levels of disease incidence.
